
[X86][AVX] Recognise out of bounds AVX2 shift amounts #83840

Closed
RKSimon opened this issue Mar 4, 2024 · 24 comments · Fixed by #84426 or #86922
Assignees
Labels
backend:X86 good first issue https://github.com/llvm/llvm-project/contribute missed-optimization

Comments

@RKSimon
Collaborator

RKSimon commented Mar 4, 2024

Pulled out of #39822 which was a bit too general.

Unlike the general ISD SRA/SRL/SHL nodes, the AVX2 vector shift nodes X86ISD VSRAV/VSRLV/VSHLV handle out-of-bounds shift amounts:

  • VSRAV clamps the unsigned shift amount to (BITWIDTH-1)
  • VSRLV/VSHLV return zero for unsigned shift amounts greater than (BITWIDTH-1)

So when lowering vector shifts, we should be able to fold these shift-amount clamp patterns away and use the X86ISD node types.

e.g.

define <4 x i32> @ashr(<4 x i32> %sh, <4 x i32> %amt) {
  %elt.min.i = tail call <4 x i32> @llvm.umin.v4i32(<4 x i32> %amt, <4 x i32> <i32 31, i32 31, i32 31, i32 31>)
  %shr = ashr <4 x i32> %sh, %elt.min.i
  ret <4 x i32> %shr
}

->

ashr(int vector[4], unsigned int vector[4]):
        vpbroadcastd    xmm2, dword ptr [rip + .LCPI0_0] # xmm2 = [31,31,31,31]
        vpminud xmm1, xmm1, xmm2
        vpsravd xmm0, xmm0, xmm1
        ret

vs

ashr(int vector[4], unsigned int vector[4]):
        vpsravd xmm0, xmm0, xmm1
        ret

Logical shifts are trickier but also foldable:

define <4 x i32> @lshr(<4 x i32> %sh, <4 x i32> %amt) {
  %cmp.i = icmp ult <4 x i32> %amt, <i32 32, i32 32, i32 32, i32 32>
  %shr = lshr <4 x i32> %sh, %amt
  %0 = select <4 x i1> %cmp.i, <4 x i32> %shr, <4 x i32> zeroinitializer
  ret <4 x i32> %0
}

define <4 x i32> @lshr2(<4 x i32> %sh, <4 x i32> %amt) {
  %cmp.i = icmp ult <4 x i32> %amt, <i32 32, i32 32, i32 32, i32 32>
  %0 = select <4 x i1> %cmp.i, <4 x i32> %sh, <4 x i32> zeroinitializer
  %shr = lshr <4 x i32> %0, %amt
  ret <4 x i32> %shr
}
@RKSimon RKSimon added good first issue https://github.com/llvm/llvm-project/contribute backend:X86 missed-optimization labels Mar 4, 2024
@llvmbot
Collaborator

llvmbot commented Mar 4, 2024

@llvm/issue-subscribers-backend-x86


@llvmbot
Collaborator

llvmbot commented Mar 4, 2024

Hi!

This issue may be a good introductory issue for people new to working on LLVM. If you would like to work on this issue, your first steps are:

  1. In the comments of the issue, request for it to be assigned to you.
  2. Fix the issue locally.
  3. Run the test suite locally. Remember that the subdirectories under test/ create fine-grained testing targets, so you can e.g. use make check-clang-ast to only run Clang's AST tests.
  4. Create a Git commit.
  5. Run git clang-format HEAD~1 to format your changes.
  6. Open a pull request to the upstream repository on GitHub. Detailed instructions can be found in GitHub's documentation.

If you have any further questions about this issue, don't hesitate to ask via a comment in the thread below.

@llvmbot
Collaborator

llvmbot commented Mar 4, 2024

@llvm/issue-subscribers-good-first-issue


@RKSimon
Collaborator Author

RKSimon commented Mar 4, 2024

@SahilPatidar
Contributor

@RKSimon,
I'm interested in working on this issue. Could you confirm whether all the described behavior of the AVX2 vector shift nodes (X86ISD VSRAV/VSRLV/VSHLV) and the handling of out-of-bounds shift amounts lives in the llvm/lib/Target/X86 directory?

@RKSimon
Collaborator Author

RKSimon commented Mar 4, 2024

I expect all of this to be handled as DAG combines inside X86ISelLowering.cpp

@SahilPatidar
Contributor

What command did you use for this?

@SahilPatidar
Contributor

@RKSimon,
I am working on it, so could you please assign it to me?

@RKSimon
Collaborator Author

RKSimon commented Mar 4, 2024

I suggest you start with the SRA(X,UMIN(Y,BW-1)) case in combineShiftRightArithmetic

You should be able to use supportedVectorVarShift to detect when the target supports replacing the ISD::SRA node with X86ISD::VSRAV

@SahilPatidar
Contributor

@RKSimon,
While reviewing the X86ISelLowering codebase, I encountered a scenario where the code reaches the SRA case and calls the combineShiftRightArithmetic function. However, when the following condition is met:

if (VT.isVector() || N1.getOpcode() != ISD::Constant ||
    N0.getOpcode() != ISD::SHL || !N0.hasOneUse() ||
    N0.getOperand(1).getOpcode() != ISD::Constant)
  return SDValue();

the code returns without performing any optimization.

My observation is that even though we are working with vectors, the code does not invoke the combineVectorShiftImm function in these cases.

I would like to understand why the code behaves this way and does not call the combineVectorShiftImm function when dealing with vectors. Could you please provide some insights into this behavior?

@RKSimon
Collaborator Author

RKSimon commented Mar 4, 2024

The code below that early-out is for a single fold (see the description comment) - just above it is the combineShiftToPMULH fold - you can put your code after combineShiftToPMULH.

@SahilPatidar
Contributor

@RKSimon
I have been working on a particular piece of code in LLVM and encountered an unexpected behavior that I'm seeking clarification on.

APInt ShiftAmt;
SDNode *NNode = N1.getNode();
if (NNode->getOpcode() == ISD::UMIN &&
    ISD::isConstantSplatVector(NNode->getOperand(1).getNode(), ShiftAmt) && 
    ShiftAmt == VT.getScalarSizeInBits() - 1) {
  SDValue SHAmtVal = NNode->getOperand(0);
  SDLoc DL(N);
  return DAG.getNode(N->getOpcode(), DL, N->getVTList(), N0, SHAmtVal);
}

When running this code, I observed the following unexpected output:

vpsravd %xmm1, %xmm0, %xmm0
retq

I would greatly appreciate it if you could help me understand why the order of the xmm operands changes in the output.

@RKSimon
Collaborator Author

RKSimon commented Mar 5, 2024

The order just looks like the difference between Intel and AT&T assembly syntax.

@SahilPatidar
Contributor

@RKSimon, I think some changes need to be made in select and logical shift operations.

@RKSimon
Collaborator Author

RKSimon commented Mar 6, 2024

I suggest you just focus on the SRA case first

@SahilPatidar
Contributor

@RKSimon, could the code I shared above be improved? If it's incorrect, please let me know.

@RKSimon
Collaborator Author

RKSimon commented Mar 6, 2024

That SRA code looks fine - I'd like to see it in a PR, and then there is the question of adding test coverage.

@SahilPatidar
Contributor

@RKSimon,
Do tests need to be changed in combine-sra.ll?

@RKSimon
Collaborator Author

RKSimon commented Mar 6, 2024

@RKSimon, Do tests need to be changed in combine-sra.ll?

Yes

SahilPatidar added a commit to SahilPatidar/llvm-project that referenced this issue Mar 8, 2024
SahilPatidar added a commit to SahilPatidar/llvm-project that referenced this issue Mar 11, 2024
SahilPatidar added a commit to SahilPatidar/llvm-project that referenced this issue Mar 11, 2024
SahilPatidar added a commit to SahilPatidar/llvm-project that referenced this issue Mar 12, 2024
SahilPatidar added a commit to SahilPatidar/llvm-project that referenced this issue Mar 12, 2024
SahilPatidar added a commit to SahilPatidar/llvm-project that referenced this issue Mar 15, 2024
@RKSimon RKSimon reopened this Mar 15, 2024
@RKSimon
Collaborator Author

RKSimon commented Mar 15, 2024

ISD::SRA handling landed at #84426 / 1865655

@SahilPatidar
Contributor

SahilPatidar commented Mar 18, 2024

@RKSimon, it seems SRL also needs to be addressed. Is that correct?

@RKSimon
Collaborator Author

RKSimon commented Mar 18, 2024

Logical shift left/right both need dealing with, but those will be combined from either VSELECT or AND nodes.

@SahilPatidar
Contributor

Logical shift left/right both need dealing with, but those will be combined from either VSELECT or AND nodes.

could you please provide an example for AND?

@RKSimon
Collaborator Author

RKSimon commented Mar 24, 2024

This was the kind of thing I had in mind: https://gcc.godbolt.org/z/rdh9j3ocW

SahilPatidar added a commit to SahilPatidar/llvm-project that referenced this issue Mar 28, 2024
SahilPatidar added a commit to SahilPatidar/llvm-project that referenced this issue Apr 8, 2024
SahilPatidar added a commit to SahilPatidar/llvm-project that referenced this issue May 5, 2024
SahilPatidar added a commit to SahilPatidar/llvm-project that referenced this issue Jun 12, 2024
aaryanshukla pushed a commit to aaryanshukla/llvm-project that referenced this issue Jul 14, 2024