-
Notifications
You must be signed in to change notification settings - Fork 10.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SystemZ] Fix Operand Retrieval for Vector Reduction Intrinsic in shouldExpandReduction
#88874
Conversation
@uweigand @JonPsson1 FYI |
@llvm/pr-subscribers-backend-systemz Author: Dominik Steenken (dominik-steenken) ChangesIn the existing version, SystemZTTIImpl::shouldExpandReduction will create a Full diff: https://github.com/llvm/llvm-project/pull/88874.diff 1 Files Affected:
diff --git a/llvm/lib/Target/SystemZ/SystemZTargetTransformInfo.cpp b/llvm/lib/Target/SystemZ/SystemZTargetTransformInfo.cpp
index 4c9e78c05dbcac..5da42d7cccd3c1 100644
--- a/llvm/lib/Target/SystemZ/SystemZTargetTransformInfo.cpp
+++ b/llvm/lib/Target/SystemZ/SystemZTargetTransformInfo.cpp
@@ -18,6 +18,7 @@
#include "llvm/CodeGen/BasicTTIImpl.h"
#include "llvm/CodeGen/CostTable.h"
#include "llvm/CodeGen/TargetLowering.h"
+#include "llvm/IR/DerivedTypes.h"
#include "llvm/IR/IntrinsicInst.h"
#include "llvm/IR/Intrinsics.h"
#include "llvm/Support/Debug.h"
@@ -1323,25 +1324,43 @@ SystemZTTIImpl::getIntrinsicInstrCost(const IntrinsicCostAttributes &ICA,
}
bool SystemZTTIImpl::shouldExpandReduction(const IntrinsicInst *II) const {
- // Always expand on Subtargets without vector instructions
+ // Always expand on Subtargets without vector instructions.
if (!ST->hasVector())
return true;
- // Always expand for operands that do not fill one vector reg
- auto *Type = cast<FixedVectorType>(II->getOperand(0)->getType());
- unsigned NumElts = Type->getNumElements();
- unsigned ScalarSize = Type->getScalarSizeInBits();
+ // Find the type of the vector operand of the intrinsic
+ // This assumes that each vector reduction intrinsic only
+ // has one vector operand.
+ FixedVectorType *VType = 0x0;
+ for (unsigned I = 0; I < II->getNumOperands(); ++I) {
+ auto *T = II->getOperand(I)->getType();
+ if (T->isVectorTy()) {
+ VType = cast<FixedVectorType>(T);
+ break;
+ }
+ }
+
+ // If we did not find a vector operand, do not continue.
+ if (VType == 0x0)
+ return true;
+
+ // If the vector operand is not a full vector, the reduction
+ // should be expanded.
+ unsigned NumElts = VType->getNumElements();
+ unsigned ScalarSize = VType->getScalarSizeInBits();
unsigned MaxElts = SystemZ::VectorBits / ScalarSize;
if (NumElts < MaxElts)
return true;
- // Otherwise
+ // Handling of full vector operands depends on the
+ // individual intrinsic.
switch (II->getIntrinsicID()) {
- // Do not expand vector.reduce.add
- case Intrinsic::vector_reduce_add:
- // Except for i64, since the performance benefit is dubious there
- return ScalarSize >= 64;
default:
return true;
+ // Do not expand vector.reduce.add...
+ case Intrinsic::vector_reduce_add:
+ // ...unless the scalar size is i64 or larger, since the
+ // performance benefit is dubious there
+ return ScalarSize >= 64;
}
}
|
// Find the type of the vector operand of the intrinsic | ||
// This assumes that each vector reduction intrinsic only | ||
// has one vector operand. | ||
FixedVectorType *VType = 0x0; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nullptr
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok
break; | ||
} | ||
} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It doesn't seem quite foolproof to me to make the assumption that all vector operands have the same type. Or is it?
If so, one option could be to keep looping and if VType is non-null (following operands) assert that VType == T.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was going off the list of intrinsics here. According to that, all of the reduction intrinsics either have a single vector as an operand, or a scalar and a vector. So any vector found should be the only vector.
However, if we want to be robust towards future intrinsic additions, we might want to handle this diffferently? However, once there are more than one vector operand, how would we generically decide which one the vector that is supposed to be reduced is?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should either explicitly handle each intrinsic by opcode (opc -> vector type), or else at least make sure that there is only one vector operand type. If something different emerges in the future, I guess then we would have to handle it by opcode.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm just thinking that, whether or not the operand vector is full is probably going to be a factor in many of the reductions. So it seems wrong to have it just in the branch for vector.reduce.add
. I do see your point though about keeping things as simple as possible. I'm going to refactor things a bit to make things simpler, but kepp some of the extensibility in case we need to handle more intrinsics.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
shouldExpandReduction() looks nice now.
I don't quite like getFirstOperandVectorType(), it's not robust against any future instrinsics we do not know about, and it doesn't have an assert that would discover any such future cases if they emerge.
We want to know the intrinsic type, and we know which intrinsic we are dealing with. So why make it unnecessarily general? No need to loop as we know what operand number we need to use. This could be done directly in shouldExpandReduction(), or if you want to have a function for it, you could pass the operand number to it.
default: | ||
return true; | ||
// Do not expand vector.reduce.add... | ||
case Intrinsic::vector_reduce_add: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If this is the only reduction handled, would it simplify to just check against this near the top, and return true for anything else?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That would simplify it, however, the initial intent was that we would probably want to do similar things for other reduction intrinsics in the future, so i wanted it to be a bit more generic.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think generally we might want to keep the code as simple as possible until we actually need something more complex. In this case it is slightly confusing to me to go through the switch and all, when there is actually only one intrinsic handled... But not sure if this is just my opinion...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This style of commenting is kind of nice but I find it perhaps a little unusual compared to https://llvm.org/docs/CodingStandards.html#id8. I guess it's up to you how you want to write it, but at the very least add the final period. I think typically the comment for a particular switch case should be in one place, probably all just below the 'case' line.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i will add the .
// ...unless the scalar size is i64 or larger, | ||
// or the operand vector is not full, since the | ||
// performance benefit is dubious in those cases | ||
return (VType->getScalarSizeInBits() >= 64) || not isVectorFull(VType); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
isVectorFull: So the idea is that any vectors which are less than one vector should be expanded, but e.g. 1.5 vectors should not be..? Couldn't you then just check if the total size of VType is >= SystemZ::VectorBits?
not -> !
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You're right, that would be easier. i was thinking in terms of element counts, but we don't really need that.
✅ With the latest revision this PR passed the C/C++ code formatter. |
assert(T->isVectorTy()); | ||
return cast<FixedVectorType>(T); | ||
} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since the intrinsic is defined so that it must have a vector of integer values, we do not need to verify this. I think this should be the same for other intrinsics as well. Did you try to use some other kind of type and compile it with 'opt'? I think it will verify the module and report broken IR.
Given that we don't need to have an extra redundant check here, I think this method (at least for now) is a bit superfluous - probably easier to just get the type directly in one line below.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The vectors can also have float values, but that would not preclude the FixedVectorType
. The idea behind the assert was that when you use this function to get the operand type for a given intrinsics vector operand, to check that you supplied the correct operand index, so we don't run into the same error that motivated this PR in the first place.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that is over-asserting as the operand index / type never changes and is trivially entered just once and has to be correct per the language spec.
The problem was that we handled some other intrinsic by accident, not that we messed up the operand index of the one we are handling, right?
Do you agree, or you think I am missing something here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think handling other intrinsics was intended (but primarily for future use), and the problem arose because some of them had the vector operand in a different place.
But this is not a hill i'm willing to die on. If you prefer to not have the assert and then move the cast out of the function and back into the case statement, i'll do that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's kind of interesting if it is needed or not, and maybe also a matter of style, perhaps. @uweigand maybe has an opinion as well?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I do believe this assert is redundant - it is fails, the immediately following cast would also fail, which would give us the same information. And without the assert, the function becomes so trivial that it might as well be inlined back in place?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
correct, without the assert, the function can go.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See inline comments. Otherwise looks good to me.
assert(T->isVectorTy()); | ||
return cast<FixedVectorType>(T); | ||
} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I do believe this assert is redundant - it is fails, the immediately following cast would also fail, which would give us the same information. And without the assert, the function becomes so trivial that it might as well be inlined back in place?
// ...unless the scalar size is i64 or larger, | ||
// or the operand vector is not full, since the | ||
// performance benefit is dubious in those cases. | ||
return (VType->getScalarSizeInBits() >= 64) || |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We don't need the parentheses here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, i've incorporated both of these comments.
In the existing version, SystemZTTIImpl::shouldExpandReduction will create a `cast` error when handling vector reduction intrinsics that do not have the vector to reduce as their first operand, such as `llvm.vector.reduce.fadd` and `llvm.vector.reduce.fmul`. This commit fixes that problem by moving the cast into the case statement that handles the specific intrinsic, where the vector operand position is well-known.
181b77b
to
da9a7b3
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM now, thanks!
Hi @dominik-steenken I just noticed that the commit message no longer matches what the code now actually does ... can you fix this before I merge it? Thanks! |
I thought i did - i changed the reference to a loop from earlier to
do you think that's not accurate? |
This is fine - but this is not what GitHub's "merge and squash" was suggesting as commit message. There I saw "This commit fixes that problem by introducing a short loop to find the vector operand instead of assuming that it is the first operand." instead, not sure why. Anyway, I've now merged using the correct message. |
…ouldExpandReduction` (llvm#88874) In the existing version, SystemZTTIImpl::shouldExpandReduction will create a `cast` error when handling vector reduction intrinsics that do not have the vector to reduce as their first operand, such as `llvm.vector.reduce.fadd` and `llvm.vector.reduce.fmul`. This commit fixes that problem by moving the cast into the case statement that handles the specific intrinsic, where the vector operand position is well-known.
In the existing version, SystemZTTIImpl::shouldExpandReduction will create a
cast
error when handling vector reduction intrinsics that do not have the vector to reduce as their first operand, such asllvm.vector.reduce.fadd
andllvm.vector.reduce.fmul
. This commit fixes that problem by introducing a short loop to find the vector operand instead of assuming that it is the first operand.