New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

Sign up for GitHub

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Jump to bottom

[PowerPC] Fix vector_shuffle combines when inputs are scalar_to_vector of differing types. #80784

Open

amy-kwan wants to merge 3 commits into llvm:main from amy-kwan:amy-kwan/vector-shuff-stov

+831 −978

Contributor

amy-kwan commented Feb 6, 2024

This patch fixes the combines for vector_shuffles when either or both of its left
and right hand side inputs are scalar_to_vector nodes.

Previously, when both left and right side inputs are scalar_to_vector nodes, the
current combine could not handle this situation, as the shuffle mask was updated
incorrectly. To temporarily solve this solution, this combine was simply disabled
and not performed.

Now, not only does this patch aim to resolve the previous issue of the incorrect
shuffle mask adjustments respectively, but it also updates any test cases that are
affected by this change.

Patch migrated from https://reviews.llvm.org/D130487.

amy-kwan added the backend:PowerPC label

amy-kwan requested review from maryammo, kamaub, lei137 and stefanp-ibm

February 6, 2024 02:20

amy-kwan self-assigned this

lei137 requested a review from nemanjai

February 6, 2024 13:54

Contributor Author

amy-kwan commented Feb 16, 2024

Ping.

2 similar comments

Contributor Author

amy-kwan commented Mar 2, 2024

Ping.

Contributor Author

amy-kwan commented Mar 18, 2024

Ping.

Contributor Author

amy-kwan commented Apr 1, 2024

Ping on this review once again :)

redstar reviewed

View reviewed changes

llvm/lib/Target/PowerPC/PPCISelLowering.cpp Outdated

+              static void fixupShuffleMaskForPermutedSToV(
+                  SmallVectorImpl<int> &ShuffV, int LHSFirstElt, int LHSLastElt,
+                  int RHSFirstElt, int RHSLastElt, int HalfVec, unsigned LHSNumValidElts,
+                  unsigned RHSNumValidElts, const PPCSubtarget &Subtarget) {
                 for (int i = 0, e = ShuffV.size(); i < e; i++) {

Member

redstar Apr 1, 2024

Sorry, a nit: according to the coding standard, variable names should be uppercase. In addition, preincrement is preferred:

Suggested change

      
              for (int i = 0, e = ShuffV.size(); i < e; i++) {
          
             for (int I = 0, E = ShuffV.size(); I < E; ++I) {

llvm/lib/Target/PowerPC/PPCISelLowering.cpp Outdated

+              static bool isShuffleMaskInRange(const SmallVectorImpl<int> &ShuffV,
+                                               int HalfVec, int LHSLastElementDefined,
+                                               int RHSLastElementDefined) {
+                for (int i : seq<int>(0, ShuffV.size())) {

Member

redstar Apr 1, 2024

Same as above:

Suggested change

      
              for (int i : seq<int>(0, ShuffV.size())) {
          
              for (int I : seq<int>(0, ShuffV.size())) {

amy-kwan requested a review from redstar

April 5, 2024 15:51

diggerlin reviewed

View reviewed changes

llvm/lib/Target/PowerPC/PPCISelLowering.cpp Outdated

    
                  int Idx = ShuffV[I];

                  if (Idx >= LHSFirstElt && Idx <= LHSLastElt)

                    ShuffV[I] +=

                        Subtarget.isLittleEndian() ? HalfVec : HalfVec - LHSNumValidElts;

Contributor

diggerlin Apr 10, 2024

pull out line the loop , only need to calculate the value once

Subtarget.isLittleEndian() ? HalfVec : HalfVec - LHSNumValidElts; and assign to variable .

llvm/lib/Target/PowerPC/PPCISelLowering.cpp Outdated

    
                        Subtarget.isLittleEndian() ? HalfVec : HalfVec - LHSNumValidElts;

                  if (Idx >= RHSFirstElt && Idx <= RHSLastElt)

                    ShuffV[I] +=

                        Subtarget.isLittleEndian() ? HalfVec : HalfVec - RHSNumValidElts;

Contributor

diggerlin Apr 10, 2024

ditto

diggerlin reviewed

View reviewed changes

llvm/lib/Target/PowerPC/PPCISelLowering.cpp Outdated

    
                    // (shuff (s_to_v i32), (bitcast (s_to_v i64), v4i32), ...)

                    // The last element that comes from the LHS is actually 0, not 3

                    // because elements 1 and higher of a scalar_to_vector are undefined.

                    LHSLastElt = LHSScalarSize / (ShuffleEltWidth + 1);

Contributor

diggerlin Apr 10, 2024 •

edited

I am curious why ShuffleEltWidth + 1 in LHSLastElt = LHSScalarSize / (ShuffleEltWidth + 1); , can you explain it ?

Contributor Author

amy-kwan Apr 18, 2024

This is just for adjustment for the computation to ensure we get the correct last element.

For example,

          t36: i32,ch = load<(load (s32) from %ir.a)> t0, t2, undef:i64
        t37: v4i32 = scalar_to_vector t36
      t38: v16i8 = bitcast t37
            t4: i64,ch = CopyFromReg t0, Register:i64 %1
          t39: i64,ch = load<(load (s64) from %ir.b)> t0, t4, undef:i64
        t40: v2i64 = scalar_to_vector t39
      t41: v16i8 = bitcast t40
    t17: v16i8 = vector_shuffle<0,1,16,17,18,19,20,21,22,23,u,u,u,u,u,u> t38, t41

For this, I believe we should have:

LHSFirstElt = 0
LHSLastElt = 3
RHSFirstElt = 16
RHSLastElt = 23

However, if we did LHSLastElt = LHSScalarSize / (ShuffleEltWidth);, we would have 32 / 8 = 4, which isn't correct. If we did 32 / 8 + 1 = 32 / 9, we would get 3, which is what we expect.

llvm/lib/Target/PowerPC/PPCISelLowering.cpp Outdated

    
                    // because elements 1 and higher of a scalar_to_vector are undefined.

                    // It is also not 4 because the original scalar_to_vector is wider and

                    // actually contains two i32 elements.

                    RHSLastElt = RHSScalarSize / (ShuffleEltWidth + 1) + RHSFirstElt;

Contributor

diggerlin Apr 10, 2024 •

edited

most of code

if (SToVLHS) {
  ... 
}

as same with

if (SToVRHS) {
... 
}

can you add a helper function or lambda function to simply the code ?

amy-kwan added 3 commits

April 15, 2024 09:41


          [PowerPC] Fix vector_shuffle combines when inputs are scalar_to_vecto…

606ea91

…r of differing types.

This patch fixes the combines for vector_shuffles when either or both of its
left and right hand side inputs are scalar_to_vector nodes.

Previously, when both left and right side inputs are scalar_to_vector nodes,
the current combine could not handle this situation, as the shuffle mask was
updated incorrectly. To temporarily solve this solution, this combine was
simply disabled and not performed.

Now, not only does this patch aim to resolve the previous issue of the incorrect
shuffle mask adjustments respectively, but it also updates any test cases
that are affected by this change.

Patch migrated from https://reviews.llvm.org/D130487.


          Address coding guideline comment

0bde144


          Save frequent calculations into separate variables, and pull out comm…

efd6964

…on code into a static function.

amy-kwan force-pushed the amy-kwan/vector-shuff-stov branch from 6df51c4 to efd6964 Compare

April 18, 2024 15:41

amy-kwan requested a review from diggerlin

April 18, 2024 16:09

maryammo reviewed

View reviewed changes

llvm/lib/Target/PowerPC/PPCISelLowering.cpp

+                  // entries in the range [RHSFirstElt,RHSLastElt] will be adjusted.
+                  fixupShuffleMaskForPermutedSToV(
+                      ShuffV, LHSFirstElt, LHSLastElt, RHSFirstElt, RHSLastElt, HalfVec,
+                      LHSNumValidElts, RHSNumValidElts, Subtarget);

Contributor

maryammo Apr 24, 2024

it seems LHSNumValidElts and RHSNumValidElts are both equal to HalfVec, if so why passing 2 separate parameter for them?

Contributor Author

amy-kwan Apr 25, 2024

That is correct that they are both equal to HalfVec initially, but they can change within generateSToVPermutedForVecShuffle() since one of the parameters to this function is unsigned &NumValidElts so I require them to be separate variables.

llvm/lib/Target/PowerPC/PPCISelLowering.cpp

-                                                  HalfVec, ValidLaneWidth, Subtarget);
+                  // entries in the range [RHSFirstElt,RHSLastElt] will be adjusted.
+                  fixupShuffleMaskForPermutedSToV(
+                      ShuffV, LHSFirstElt, LHSLastElt, RHSFirstElt, RHSLastElt, HalfVec,

Contributor

maryammo Apr 24, 2024

same question about LHSFirstElt?

Contributor Author

amy-kwan Apr 25, 2024

Although the LHSFirstElt and RHSFirstElt values do not change, I think it would be good to be explicit to mention them here so it's obvious to the reader of the code what values are needed when we need to fix up the shuffle mask (after generating the SCALAR_TO_VECTOR_PERMUTED node).

llvm/lib/Target/PowerPC/PPCISelLowering.cpp

-                      SToVLHS = DAG.getBitcast(LHS.getValueType(), SToVLHS);
-                    LHS = SToVLHS;
+                    LHS = generateSToVPermutedForVecShuffle(
+                        LHSScalarSize, ShuffleEltWidth, LHSNumValidElts, LHSFirstElt,

Contributor

maryammo Apr 24, 2024

It seems LHSFirstElt is always 0, if that is the case, why it is being passed as a parameter?

Contributor Author

amy-kwan Apr 25, 2024

Yeah, the first element of LHS will always be zero. Originally, I didn't have the generateSToVPermutedForVecShuffle() function, but I added it due to the comment I received in 606ea91#r1560041740 in attempts to common up some code.

The first element of the LHS actually doesn't matter when I'm generating the SCALAR_TO_VECTOR_PERMUTED within generateSToVPermutedForVecShuffle(), so I could just pass in zero, but where it matters is at the RHS.

I chose to still pass in LHSFirstElt here just to be explicit in the code that we're dealing with the first element of the LHS.

amy-kwan requested a review from maryammo

April 25, 2024 00:06

Contributor Author

amy-kwan commented Apr 30, 2024

Ping. :-)

diggerlin reviewed

View reviewed changes

llvm/test/CodeGen/PowerPC/p8-scalar_vector_conversions.ll

@@ @@ -2505,11 +2505,9 @@ define <2 x i64> @buildi2(i64 %arg, i32 %arg1) { @@
               ;
               ; CHECK-LE-LABEL: buildi2:
               ; CHECK-LE:       # %bb.0: # %entry
-              ; CHECK-LE-NEXT:    mtfprd f0, r4
+              ; CHECK-LE-NEXT:    mtfprwz f0, r4

Contributor

diggerlin May 6, 2024

mtfprwz f0, r4 only move word (32bit) from r4 to f0,

but
mtfprd f0 , r4 move word (64bit) from r4 to f0,

is it correct ?

llvm/test/CodeGen/PowerPC/v16i8_scalar_to_vector_shuffle.ll

-              ; CHECK-BE-P8-NEXT:    mtvsrd v3, r3
-              ; CHECK-BE-P8-NEXT:    vmrghh v2, v2, v3
+              ; CHECK-BE-P8-NEXT:    mtvsrwz v3, r3
+              ; CHECK-BE-P8-NEXT:    addis r3, r2, .LCPI3_0@toc@ha

Contributor

diggerlin May 6, 2024 •

edited

I am curious . there is no read data from TC entry in the left side , but has the read data from TC entry in the right side, the PR look not related to TC entry ?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Reviewers

diggerlin diggerlin left review comments

kamaub Awaiting requested review from kamaub

lei137 Awaiting requested review from lei137

stefanp-ibm Awaiting requested review from stefanp-ibm

nemanjai Awaiting requested review from nemanjai

redstar Awaiting requested review from redstar

maryammo Awaiting requested review from maryammo