[Vectorize] Vectorization for __builtin_prefetch #66160
base: main
Conversation
(force-pushed from 4c9db73 to 51d6a07)
I have created several patches to enable LoopVectorize to vectorize loops containing __builtin_prefetch on AArch64 and generate SVE prefetch (prfb/prfh/prfw/prfd) instructions. This patch is one of them and mainly covers the IR part. In this patch, the prefetch intrinsic cannot yet be vectorized because isLegalMaskedPrefetch returns false by default, so I have not added a test here. Tests are attached to the other patches; if you want to refer to them, please refer to the following URL.
(force-pushed from 51d6a07 to 93cfc51)
Gentle ping ...
ping.
Some comments; I have not looked at all the code yet.
```cpp
case Intrinsic::masked_gather_prefetch: {
  const Value *Mask = Args[4];
  bool VarMask = !isa<Constant>(Mask);
  Align Alignment = cast<ConstantInt>(Args[1])->getAlignValue();
```
You're not checking that Args[1] is a ConstantInt. Is that guaranteed elsewhere?
I think it is guaranteed. The masked_gather_prefetch intrinsic is created only by the IRBuilder::CreateMaskedGatherPrefetch function, and Args[1] is created as a ConstantInt in that function.
This is a pattern similar to the masked_gather intrinsic, where Args[1] is not checked either.
llvm/lib/IR/IRBuilder.cpp (outdated)
```cpp
                                 Value *Locality,
                                 const Twine &Name) {
  auto *PtrTy = cast<PointerType>(Ptr->getType());
  assert(Mask && "Mask should not be all-ones (null)");
```
I don't get this message. I assumed all-ones would be all lanes, and a null value would be all zeroes, which would be no lanes (which is still a valid mask, just not useful for prefetching).
Thanks for your comment.
I had just done the same as masked load/masked store. I think that if the mask is null, the mask can be considered unnecessary, so there is no problem with treating it as all-ones.
But I agree the message as it was would be confusing, so I fixed the assert message.
The comment is outdated because I force-pushed. Please see the link above.
```cpp
  auto *PtrsTy = cast<VectorType>(Ptrs->getType());
  ElementCount NumElts = PtrsTy->getElementCount();

  if (!Mask)
```
Now I see where that comes from. I think this is a misleading tactic. I'd force the mask to have at least one value in it (if that's what you want) here, and not in the next call.
```cpp
case Instruction::Call: {
  if (!isa<PrefetchInst>(I))
    return !VFDatabase::hasMaskedVariant(*(cast<CallInst>(I)), VF);
  auto *Ptr = getPrefetchPointerOperand(I);
```
Can't you implement hasMaskedVariant for prefetch?
Thanks for the comment.
I don't think I can implement it. hasMaskedVariant checks whether the library in use has a masked, vectorized implementation, and I don't think prefetch will ever be implemented in a library, because the prefetch intrinsic is lowered to a single instruction. Therefore, we did not consider using hasMaskedVariant.
(force-pushed from 93cfc51 to 3a8518a)
✅ With the latest revision this PR passed the C/C++ code formatter.
(force-pushed from 3a8518a to 514f7be)
This patch conflicts with the main branch; it will be fixed and force-pushed after a rebase.
(force-pushed from 64541d2 to 6635fa0)
The rebase of this patch is complete.
(force-pushed from 3b8fb11 to 15c4966)
Allow vectorization of loops containing __builtin_prefetch. Add the masked_prefetch and masked_gather_prefetch intrinsics for this purpose. Also, add a step to vectorize the prefetch intrinsic in LoopVectorize.
A couple of high-level comments:
I think it might make sense to have an initial patch that introduces the intrinsics + LangRef changes and adds the codegen support for them, both for generic targets and for SVE. Once that is in, it would make adding tests for vectorization easier.
@davemgreen Thank you for your comment.
My main goal is to vectorize loops containing __builtin_prefetch. Certainly, there are several ways to vectorize such a loop. I implemented vectorization using the SVE instructions, but I think it would be better to also implement other vectorization support and options to choose between those methods.
Implementation
Does this mean it would be better to aim to merge 1.1, 2.1, and 3.2 in one patch first?
Could we add 1.1 + 2.1 + tests for some architecture (maybe AArch64 without SVE) in one patch, with 3.1 added in a second? 1.2 + 3.2 + tests can then make up the third.
Thank you for the suggestion.