-
Notifications
You must be signed in to change notification settings - Fork 10.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[PowerPC] Fix vector_shuffle combines when inputs are scalar_to_vector of differing types. #80784
base: main
Are you sure you want to change the base?
Conversation
Ping. |
2 similar comments
Ping. |
Ping. |
Ping on this review once again :) |
static void fixupShuffleMaskForPermutedSToV( | ||
SmallVectorImpl<int> &ShuffV, int LHSFirstElt, int LHSLastElt, | ||
int RHSFirstElt, int RHSLastElt, int HalfVec, unsigned LHSNumValidElts, | ||
unsigned RHSNumValidElts, const PPCSubtarget &Subtarget) { | ||
for (int i = 0, e = ShuffV.size(); i < e; i++) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, a nit: according to the coding standard, variable names should be uppercase. In addition, preincrement is preferred:
for (int i = 0, e = ShuffV.size(); i < e; i++) { | |
for (int I = 0, E = ShuffV.size(); I < E; ++I) { |
static bool isShuffleMaskInRange(const SmallVectorImpl<int> &ShuffV, | ||
int HalfVec, int LHSLastElementDefined, | ||
int RHSLastElementDefined) { | ||
for (int i : seq<int>(0, ShuffV.size())) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same as above:
for (int i : seq<int>(0, ShuffV.size())) { | |
for (int I : seq<int>(0, ShuffV.size())) { |
int Idx = ShuffV[I]; | ||
if (Idx >= LHSFirstElt && Idx <= LHSLastElt) | ||
ShuffV[I] += | ||
Subtarget.isLittleEndian() ? HalfVec : HalfVec - LHSNumValidElts; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
pull out line the loop , only need to calculate the value once
Subtarget.isLittleEndian() ? HalfVec : HalfVec - LHSNumValidElts;
and assign to variable .
Subtarget.isLittleEndian() ? HalfVec : HalfVec - LHSNumValidElts; | ||
if (Idx >= RHSFirstElt && Idx <= RHSLastElt) | ||
ShuffV[I] += | ||
Subtarget.isLittleEndian() ? HalfVec : HalfVec - RHSNumValidElts; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ditto
// (shuff (s_to_v i32), (bitcast (s_to_v i64), v4i32), ...) | ||
// The last element that comes from the LHS is actually 0, not 3 | ||
// because elements 1 and higher of a scalar_to_vector are undefined. | ||
LHSLastElt = LHSScalarSize / (ShuffleEltWidth + 1); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am curious why ShuffleEltWidth + 1
in LHSLastElt = LHSScalarSize / (ShuffleEltWidth + 1);
, can you explain it ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is just for adjustment for the computation to ensure we get the correct last element.
For example,
t36: i32,ch = load<(load (s32) from %ir.a)> t0, t2, undef:i64
t37: v4i32 = scalar_to_vector t36
t38: v16i8 = bitcast t37
t4: i64,ch = CopyFromReg t0, Register:i64 %1
t39: i64,ch = load<(load (s64) from %ir.b)> t0, t4, undef:i64
t40: v2i64 = scalar_to_vector t39
t41: v16i8 = bitcast t40
t17: v16i8 = vector_shuffle<0,1,16,17,18,19,20,21,22,23,u,u,u,u,u,u> t38, t41
For this, I believe we should have:
LHSFirstElt = 0
LHSLastElt = 3
RHSFirstElt = 16
RHSLastElt = 23
However, if we did LHSLastElt = LHSScalarSize / (ShuffleEltWidth);
, we would have 32 / 8 = 4
, which isn't correct. If we did 32 / 8 + 1 = 32 / 9
, we would get 3, which is what we expect.
// because elements 1 and higher of a scalar_to_vector are undefined. | ||
// It is also not 4 because the original scalar_to_vector is wider and | ||
// actually contains two i32 elements. | ||
RHSLastElt = RHSScalarSize / (ShuffleEltWidth + 1) + RHSFirstElt; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
most of code
if (SToVLHS) {
...
}
as same with
if (SToVRHS) {
...
}
can you add a helper function or lambda function to simply the code ?
…r of differing types. This patch fixes the combines for vector_shuffles when either or both of its left and right hand side inputs are scalar_to_vector nodes. Previously, when both left and right side inputs are scalar_to_vector nodes, the current combine could not handle this situation, as the shuffle mask was updated incorrectly. To temporarily solve this solution, this combine was simply disabled and not performed. Now, not only does this patch aim to resolve the previous issue of the incorrect shuffle mask adjustments respectively, but it also updates any test cases that are affected by this change. Patch migrated from https://reviews.llvm.org/D130487.
…on code into a static function.
6df51c4
to
efd6964
Compare
// entries in the range [RHSFirstElt,RHSLastElt] will be adjusted. | ||
fixupShuffleMaskForPermutedSToV( | ||
ShuffV, LHSFirstElt, LHSLastElt, RHSFirstElt, RHSLastElt, HalfVec, | ||
LHSNumValidElts, RHSNumValidElts, Subtarget); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it seems LHSNumValidElts and RHSNumValidElts are both equal to HalfVec, if so why passing 2 separate parameter for them?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That is correct that they are both equal to HalfVec
initially, but they can change within generateSToVPermutedForVecShuffle()
since one of the parameters to this function is unsigned &NumValidElts
so I require them to be separate variables.
HalfVec, ValidLaneWidth, Subtarget); | ||
// entries in the range [RHSFirstElt,RHSLastElt] will be adjusted. | ||
fixupShuffleMaskForPermutedSToV( | ||
ShuffV, LHSFirstElt, LHSLastElt, RHSFirstElt, RHSLastElt, HalfVec, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same question about LHSFirstElt?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Although the LHSFirstElt and RHSFirstElt values do not change, I think it would be good to be explicit to mention them here so it's obvious to the reader of the code what values are needed when we need to fix up the shuffle mask (after generating the SCALAR_TO_VECTOR_PERMUTED
node).
SToVLHS = DAG.getBitcast(LHS.getValueType(), SToVLHS); | ||
LHS = SToVLHS; | ||
LHS = generateSToVPermutedForVecShuffle( | ||
LHSScalarSize, ShuffleEltWidth, LHSNumValidElts, LHSFirstElt, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems LHSFirstElt is always 0, if that is the case, why it is being passed as a parameter?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, the first element of LHS will always be zero. Originally, I didn't have the generateSToVPermutedForVecShuffle()
function, but I added it due to the comment I received in 606ea91#r1560041740 in attempts to common up some code.
The first element of the LHS actually doesn't matter when I'm generating the SCALAR_TO_VECTOR_PERMUTED
within generateSToVPermutedForVecShuffle()
, so I could just pass in zero, but where it matters is at the RHS.
I chose to still pass in LHSFirstElt
here just to be explicit in the code that we're dealing with the first element of the LHS.
Ping. :-) |
@@ -2505,11 +2505,9 @@ define <2 x i64> @buildi2(i64 %arg, i32 %arg1) { | |||
; | |||
; CHECK-LE-LABEL: buildi2: | |||
; CHECK-LE: # %bb.0: # %entry | |||
; CHECK-LE-NEXT: mtfprd f0, r4 | |||
; CHECK-LE-NEXT: mtfprwz f0, r4 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
mtfprwz f0, r4
only move word (32bit) from r4 to f0,
but
mtfprd f0 , r4
move word (64bit) from r4 to f0,
is it correct ?
; CHECK-BE-P8-NEXT: mtvsrd v3, r3 | ||
; CHECK-BE-P8-NEXT: vmrghh v2, v2, v3 | ||
; CHECK-BE-P8-NEXT: mtvsrwz v3, r3 | ||
; CHECK-BE-P8-NEXT: addis r3, r2, .LCPI3_0@toc@ha |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am curious . there is no read data from TC entry in the left side , but has the read data from TC entry in the right side, the PR look not related to TC entry ?
This patch fixes the combines for vector_shuffles when either or both of its left
and right hand side inputs are scalar_to_vector nodes.
Previously, when both left and right side inputs are scalar_to_vector nodes, the
current combine could not handle this situation, as the shuffle mask was updated
incorrectly. To temporarily solve this solution, this combine was simply disabled
and not performed.
Now, not only does this patch aim to resolve the previous issue of the incorrect
shuffle mask adjustments respectively, but it also updates any test cases that are
affected by this change.
Patch migrated from https://reviews.llvm.org/D130487.