Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[PowerPC] Fix vector_shuffle combines when inputs are scalar_to_vector of differing types. #80784

Open
wants to merge 3 commits into
base: main
Choose a base branch
from

Conversation

amy-kwan
Copy link
Contributor

@amy-kwan amy-kwan commented Feb 6, 2024

This patch fixes the combines for vector_shuffles when either or both of its left
and right hand side inputs are scalar_to_vector nodes.

Previously, when both left and right side inputs are scalar_to_vector nodes, the
current combine could not handle this situation, as the shuffle mask was updated
incorrectly. To temporarily solve this solution, this combine was simply disabled
and not performed.

Now, not only does this patch aim to resolve the previous issue of the incorrect
shuffle mask adjustments respectively, but it also updates any test cases that are
affected by this change.

Patch migrated from https://reviews.llvm.org/D130487.

@amy-kwan
Copy link
Contributor Author

Ping.

2 similar comments
@amy-kwan
Copy link
Contributor Author

amy-kwan commented Mar 2, 2024

Ping.

@amy-kwan
Copy link
Contributor Author

Ping.

@amy-kwan
Copy link
Contributor Author

amy-kwan commented Apr 1, 2024

Ping on this review once again :)

static void fixupShuffleMaskForPermutedSToV(
SmallVectorImpl<int> &ShuffV, int LHSFirstElt, int LHSLastElt,
int RHSFirstElt, int RHSLastElt, int HalfVec, unsigned LHSNumValidElts,
unsigned RHSNumValidElts, const PPCSubtarget &Subtarget) {
for (int i = 0, e = ShuffV.size(); i < e; i++) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, a nit: according to the coding standard, variable names should be uppercase. In addition, preincrement is preferred:

Suggested change
for (int i = 0, e = ShuffV.size(); i < e; i++) {
for (int I = 0, E = ShuffV.size(); I < E; ++I) {

static bool isShuffleMaskInRange(const SmallVectorImpl<int> &ShuffV,
int HalfVec, int LHSLastElementDefined,
int RHSLastElementDefined) {
for (int i : seq<int>(0, ShuffV.size())) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as above:

Suggested change
for (int i : seq<int>(0, ShuffV.size())) {
for (int I : seq<int>(0, ShuffV.size())) {

@amy-kwan amy-kwan requested a review from redstar April 5, 2024 15:51
int Idx = ShuffV[I];
if (Idx >= LHSFirstElt && Idx <= LHSLastElt)
ShuffV[I] +=
Subtarget.isLittleEndian() ? HalfVec : HalfVec - LHSNumValidElts;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pull out line the loop , only need to calculate the value once

Subtarget.isLittleEndian() ? HalfVec : HalfVec - LHSNumValidElts; and assign to variable .

Subtarget.isLittleEndian() ? HalfVec : HalfVec - LHSNumValidElts;
if (Idx >= RHSFirstElt && Idx <= RHSLastElt)
ShuffV[I] +=
Subtarget.isLittleEndian() ? HalfVec : HalfVec - RHSNumValidElts;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto

// (shuff (s_to_v i32), (bitcast (s_to_v i64), v4i32), ...)
// The last element that comes from the LHS is actually 0, not 3
// because elements 1 and higher of a scalar_to_vector are undefined.
LHSLastElt = LHSScalarSize / (ShuffleEltWidth + 1);
Copy link
Contributor

@diggerlin diggerlin Apr 10, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am curious why ShuffleEltWidth + 1 in LHSLastElt = LHSScalarSize / (ShuffleEltWidth + 1); , can you explain it ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is just for adjustment for the computation to ensure we get the correct last element.

For example,

          t36: i32,ch = load<(load (s32) from %ir.a)> t0, t2, undef:i64
        t37: v4i32 = scalar_to_vector t36
      t38: v16i8 = bitcast t37
            t4: i64,ch = CopyFromReg t0, Register:i64 %1
          t39: i64,ch = load<(load (s64) from %ir.b)> t0, t4, undef:i64
        t40: v2i64 = scalar_to_vector t39
      t41: v16i8 = bitcast t40
    t17: v16i8 = vector_shuffle<0,1,16,17,18,19,20,21,22,23,u,u,u,u,u,u> t38, t41

For this, I believe we should have:

LHSFirstElt = 0
LHSLastElt = 3
RHSFirstElt = 16
RHSLastElt = 23

However, if we did LHSLastElt = LHSScalarSize / (ShuffleEltWidth);, we would have 32 / 8 = 4, which isn't correct. If we did 32 / 8 + 1 = 32 / 9, we would get 3, which is what we expect.

// because elements 1 and higher of a scalar_to_vector are undefined.
// It is also not 4 because the original scalar_to_vector is wider and
// actually contains two i32 elements.
RHSLastElt = RHSScalarSize / (ShuffleEltWidth + 1) + RHSFirstElt;
Copy link
Contributor

@diggerlin diggerlin Apr 10, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

most of code

if (SToVLHS) {
  ... 
}

as same with

if (SToVRHS) {
... 
}

can you add a helper function or lambda function to simply the code ?

…r of differing types.

This patch fixes the combines for vector_shuffles when either or both of its
left and right hand side inputs are scalar_to_vector nodes.

Previously, when both left and right side inputs are scalar_to_vector nodes,
the current combine could not handle this situation, as the shuffle mask was
updated incorrectly. To temporarily solve this solution, this combine was
simply disabled and not performed.

Now, not only does this patch aim to resolve the previous issue of the incorrect
shuffle mask adjustments respectively, but it also updates any test cases
that are affected by this change.

Patch migrated from https://reviews.llvm.org/D130487.
// entries in the range [RHSFirstElt,RHSLastElt] will be adjusted.
fixupShuffleMaskForPermutedSToV(
ShuffV, LHSFirstElt, LHSLastElt, RHSFirstElt, RHSLastElt, HalfVec,
LHSNumValidElts, RHSNumValidElts, Subtarget);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it seems LHSNumValidElts and RHSNumValidElts are both equal to HalfVec, if so why passing 2 separate parameter for them?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is correct that they are both equal to HalfVec initially, but they can change within generateSToVPermutedForVecShuffle() since one of the parameters to this function is unsigned &NumValidElts so I require them to be separate variables.

HalfVec, ValidLaneWidth, Subtarget);
// entries in the range [RHSFirstElt,RHSLastElt] will be adjusted.
fixupShuffleMaskForPermutedSToV(
ShuffV, LHSFirstElt, LHSLastElt, RHSFirstElt, RHSLastElt, HalfVec,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same question about LHSFirstElt?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Although the LHSFirstElt and RHSFirstElt values do not change, I think it would be good to be explicit to mention them here so it's obvious to the reader of the code what values are needed when we need to fix up the shuffle mask (after generating the SCALAR_TO_VECTOR_PERMUTED node).

SToVLHS = DAG.getBitcast(LHS.getValueType(), SToVLHS);
LHS = SToVLHS;
LHS = generateSToVPermutedForVecShuffle(
LHSScalarSize, ShuffleEltWidth, LHSNumValidElts, LHSFirstElt,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems LHSFirstElt is always 0, if that is the case, why it is being passed as a parameter?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, the first element of LHS will always be zero. Originally, I didn't have the generateSToVPermutedForVecShuffle() function, but I added it due to the comment I received in 606ea91#r1560041740 in attempts to common up some code.

The first element of the LHS actually doesn't matter when I'm generating the SCALAR_TO_VECTOR_PERMUTED within generateSToVPermutedForVecShuffle(), so I could just pass in zero, but where it matters is at the RHS.

I chose to still pass in LHSFirstElt here just to be explicit in the code that we're dealing with the first element of the LHS.

@amy-kwan amy-kwan requested a review from maryammo April 25, 2024 00:06
@amy-kwan
Copy link
Contributor Author

Ping. :-)

@@ -2505,11 +2505,9 @@ define <2 x i64> @buildi2(i64 %arg, i32 %arg1) {
;
; CHECK-LE-LABEL: buildi2:
; CHECK-LE: # %bb.0: # %entry
; CHECK-LE-NEXT: mtfprd f0, r4
; CHECK-LE-NEXT: mtfprwz f0, r4
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

mtfprwz f0, r4 only move word (32bit) from r4 to f0,

but
mtfprd f0 , r4 move word (64bit) from r4 to f0,

is it correct ?

; CHECK-BE-P8-NEXT: mtvsrd v3, r3
; CHECK-BE-P8-NEXT: vmrghh v2, v2, v3
; CHECK-BE-P8-NEXT: mtvsrwz v3, r3
; CHECK-BE-P8-NEXT: addis r3, r2, .LCPI3_0@toc@ha
Copy link
Contributor

@diggerlin diggerlin May 6, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am curious . there is no read data from TC entry in the left side , but has the read data from TC entry in the right side, the PR look not related to TC entry ?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants