Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SystemZ] Fix Operand Retrieval for Vector Reduction Intrinsic in shouldExpandReduction #88874

Merged
merged 1 commit into from
Apr 19, 2024

Conversation

dominik-steenken
Copy link
Contributor

In the existing version, SystemZTTIImpl::shouldExpandReduction will create a cast error when handling vector reduction intrinsics that do not have the vector to reduce as their first operand, such as llvm.vector.reduce.fadd and llvm.vector.reduce.fmul. This commit fixes that problem by introducing a short loop to find the vector operand instead of assuming that it is the first operand.

@dominik-steenken
Copy link
Contributor Author

@uweigand @JonPsson1 FYI

@llvmbot
Copy link
Collaborator

llvmbot commented Apr 16, 2024

@llvm/pr-subscribers-backend-systemz

Author: Dominik Steenken (dominik-steenken)

Changes

In the existing version, SystemZTTIImpl::shouldExpandReduction will create a cast error when handling vector reduction intrinsics that do not have the vector to reduce as their first operand, such as llvm.vector.reduce.fadd and llvm.vector.reduce.fmul. This commit fixes that problem by introducing a short loop to find the vector operand instead of assuming that it is the first operand.


Full diff: https://github.com/llvm/llvm-project/pull/88874.diff

1 Files Affected:

  • (modified) llvm/lib/Target/SystemZ/SystemZTargetTransformInfo.cpp (+29-10)
diff --git a/llvm/lib/Target/SystemZ/SystemZTargetTransformInfo.cpp b/llvm/lib/Target/SystemZ/SystemZTargetTransformInfo.cpp
index 4c9e78c05dbcac..5da42d7cccd3c1 100644
--- a/llvm/lib/Target/SystemZ/SystemZTargetTransformInfo.cpp
+++ b/llvm/lib/Target/SystemZ/SystemZTargetTransformInfo.cpp
@@ -18,6 +18,7 @@
 #include "llvm/CodeGen/BasicTTIImpl.h"
 #include "llvm/CodeGen/CostTable.h"
 #include "llvm/CodeGen/TargetLowering.h"
+#include "llvm/IR/DerivedTypes.h"
 #include "llvm/IR/IntrinsicInst.h"
 #include "llvm/IR/Intrinsics.h"
 #include "llvm/Support/Debug.h"
@@ -1323,25 +1324,43 @@ SystemZTTIImpl::getIntrinsicInstrCost(const IntrinsicCostAttributes &ICA,
 }
 
 bool SystemZTTIImpl::shouldExpandReduction(const IntrinsicInst *II) const {
-  // Always expand on Subtargets without vector instructions
+  // Always expand on Subtargets without vector instructions.
   if (!ST->hasVector())
     return true;
 
-  // Always expand for operands that do not fill one vector reg
-  auto *Type = cast<FixedVectorType>(II->getOperand(0)->getType());
-  unsigned NumElts = Type->getNumElements();
-  unsigned ScalarSize = Type->getScalarSizeInBits();
+  // Find the type of the vector operand of the intrinsic
+  // This assumes that each vector reduction intrinsic only
+  // has one vector operand.
+  FixedVectorType *VType = 0x0;
+  for (unsigned I = 0; I < II->getNumOperands(); ++I) {
+    auto *T = II->getOperand(I)->getType();
+    if (T->isVectorTy()) {
+      VType = cast<FixedVectorType>(T);
+      break;
+    }
+  }
+
+  // If we did not find a vector operand, do not continue.
+  if (VType == 0x0)
+    return true;
+
+  // If the vector operand is not a full vector, the reduction
+  // should be expanded.
+  unsigned NumElts = VType->getNumElements();
+  unsigned ScalarSize = VType->getScalarSizeInBits();
   unsigned MaxElts = SystemZ::VectorBits / ScalarSize;
   if (NumElts < MaxElts)
     return true;
 
-  // Otherwise
+  // Handling of full vector operands depends on the
+  // individual intrinsic.
   switch (II->getIntrinsicID()) {
-  // Do not expand vector.reduce.add
-  case Intrinsic::vector_reduce_add:
-    // Except for i64, since the performance benefit is dubious there
-    return ScalarSize >= 64;
   default:
     return true;
+  // Do not expand vector.reduce.add...
+  case Intrinsic::vector_reduce_add:
+    // ...unless the scalar size is i64 or larger, since the
+    // performance benefit is dubious there
+    return ScalarSize >= 64;
   }
 }

// Find the type of the vector operand of the intrinsic
// This assumes that each vector reduction intrinsic only
// has one vector operand.
FixedVectorType *VType = 0x0;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nullptr

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok

break;
}
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It doesn't seem quite foolproof to me to make the assumption that all vector operands have the same type. Or is it?

If so, one option could be to keep looping and if VType is non-null (following operands) assert that VType == T.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was going off the list of intrinsics here. According to that, all of the reduction intrinsics either have a single vector as an operand, or a scalar and a vector. So any vector found should be the only vector.
However, if we want to be robust towards future intrinsic additions, we might want to handle this diffferently? However, once there are more than one vector operand, how would we generically decide which one the vector that is supposed to be reduced is?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should either explicitly handle each intrinsic by opcode (opc -> vector type), or else at least make sure that there is only one vector operand type. If something different emerges in the future, I guess then we would have to handle it by opcode.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm just thinking that, whether or not the operand vector is full is probably going to be a factor in many of the reductions. So it seems wrong to have it just in the branch for vector.reduce.add. I do see your point though about keeping things as simple as possible. I'm going to refactor things a bit to make things simpler, but kepp some of the extensibility in case we need to handle more intrinsics.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

shouldExpandReduction() looks nice now.

I don't quite like getFirstOperandVectorType(), it's not robust against any future instrinsics we do not know about, and it doesn't have an assert that would discover any such future cases if they emerge.

We want to know the intrinsic type, and we know which intrinsic we are dealing with. So why make it unnecessarily general? No need to loop as we know what operand number we need to use. This could be done directly in shouldExpandReduction(), or if you want to have a function for it, you could pass the operand number to it.

default:
return true;
// Do not expand vector.reduce.add...
case Intrinsic::vector_reduce_add:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this is the only reduction handled, would it simplify to just check against this near the top, and return true for anything else?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That would simplify it, however, the initial intent was that we would probably want to do similar things for other reduction intrinsics in the future, so i wanted it to be a bit more generic.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think generally we might want to keep the code as simple as possible until we actually need something more complex. In this case it is slightly confusing to me to go through the switch and all, when there is actually only one intrinsic handled... But not sure if this is just my opinion...

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This style of commenting is kind of nice but I find it perhaps a little unusual compared to https://llvm.org/docs/CodingStandards.html#id8. I guess it's up to you how you want to write it, but at the very least add the final period. I think typically the comment for a particular switch case should be in one place, probably all just below the 'case' line.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i will add the .

// ...unless the scalar size is i64 or larger,
// or the operand vector is not full, since the
// performance benefit is dubious in those cases
return (VType->getScalarSizeInBits() >= 64) || not isVectorFull(VType);
}
Copy link
Contributor

@JonPsson1 JonPsson1 Apr 17, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

isVectorFull: So the idea is that any vectors which are less than one vector should be expanded, but e.g. 1.5 vectors should not be..? Couldn't you then just check if the total size of VType is >= SystemZ::VectorBits?

not -> !

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're right, that would be easier. i was thinking in terms of element counts, but we don't really need that.

Copy link

github-actions bot commented Apr 17, 2024

✅ With the latest revision this PR passed the C/C++ code formatter.

assert(T->isVectorTy());
return cast<FixedVectorType>(T);
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since the intrinsic is defined so that it must have a vector of integer values, we do not need to verify this. I think this should be the same for other intrinsics as well. Did you try to use some other kind of type and compile it with 'opt'? I think it will verify the module and report broken IR.

Given that we don't need to have an extra redundant check here, I think this method (at least for now) is a bit superfluous - probably easier to just get the type directly in one line below.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The vectors can also have float values, but that would not preclude the FixedVectorType. The idea behind the assert was that when you use this function to get the operand type for a given intrinsics vector operand, to check that you supplied the correct operand index, so we don't run into the same error that motivated this PR in the first place.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that is over-asserting as the operand index / type never changes and is trivially entered just once and has to be correct per the language spec.

The problem was that we handled some other intrinsic by accident, not that we messed up the operand index of the one we are handling, right?

Do you agree, or you think I am missing something here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think handling other intrinsics was intended (but primarily for future use), and the problem arose because some of them had the vector operand in a different place.

But this is not a hill i'm willing to die on. If you prefer to not have the assert and then move the cast out of the function and back into the case statement, i'll do that.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's kind of interesting if it is needed or not, and maybe also a matter of style, perhaps. @uweigand maybe has an opinion as well?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do believe this assert is redundant - it is fails, the immediately following cast would also fail, which would give us the same information. And without the assert, the function becomes so trivial that it might as well be inlined back in place?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

correct, without the assert, the function can go.

Copy link
Member

@uweigand uweigand left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See inline comments. Otherwise looks good to me.

assert(T->isVectorTy());
return cast<FixedVectorType>(T);
}

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do believe this assert is redundant - it is fails, the immediately following cast would also fail, which would give us the same information. And without the assert, the function becomes so trivial that it might as well be inlined back in place?

// ...unless the scalar size is i64 or larger,
// or the operand vector is not full, since the
// performance benefit is dubious in those cases.
return (VType->getScalarSizeInBits() >= 64) ||
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't need the parentheses here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, i've incorporated both of these comments.

In the existing version, SystemZTTIImpl::shouldExpandReduction will
create a `cast` error when handling vector reduction intrinsics that
do not have the vector to reduce as their first operand, such as
`llvm.vector.reduce.fadd` and `llvm.vector.reduce.fmul`.
This commit fixes that problem by moving the cast into the case
statement that handles the specific intrinsic, where the vector
operand position is well-known.
Copy link
Member

@uweigand uweigand left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM now, thanks!

@uweigand
Copy link
Member

Hi @dominik-steenken I just noticed that the commit message no longer matches what the code now actually does ... can you fix this before I merge it? Thanks!

@dominik-steenken
Copy link
Contributor Author

I thought i did - i changed the reference to a loop from earlier to

This commit fixes that problem by moving the cast into the case
statement that handles the specific intrinsic, where the vector
operand position is well-known.

do you think that's not accurate?

@uweigand uweigand merged commit 5af9701 into llvm:main Apr 19, 2024
3 of 4 checks passed
@uweigand
Copy link
Member

I thought i did - i changed the reference to a loop from earlier to

This commit fixes that problem by moving the cast into the case
statement that handles the specific intrinsic, where the vector
operand position is well-known.

do you think that's not accurate?

This is fine - but this is not what GitHub's "merge and squash" was suggesting as commit message. There I saw "This commit fixes that problem by introducing a short loop to find the vector operand instead of assuming that it is the first operand." instead, not sure why. Anyway, I've now merged using the correct message.

aniplcc pushed a commit to aniplcc/llvm-project that referenced this pull request Apr 21, 2024
…ouldExpandReduction` (llvm#88874)

In the existing version, SystemZTTIImpl::shouldExpandReduction will
create a `cast` error when handling vector reduction intrinsics that
do not have the vector to reduce as their first operand, such as
`llvm.vector.reduce.fadd` and `llvm.vector.reduce.fmul`.
This commit fixes that problem by moving the cast into the case
statement that handles the specific intrinsic, where the vector
operand position is well-known.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants