
[Vectorize] Vectorization for __builtin_prefetch #66160

Open · wants to merge 1 commit into main

Conversation

m-saito-fj

Allow vectorization of loops containing __builtin_prefetch. Add a masked_prefetch intrinsic and a masked_gather_prefetch intrinsic for this purpose. Also, teach LoopVectorize to vectorize the prefetch intrinsic.
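For context, here is a minimal sketch of the kind of loop this change is meant to enable; the function and names are illustrative and not taken from the patch:

    // A loop of this shape is currently left scalar because of the
    // __builtin_prefetch call; with masked prefetch intrinsics the whole
    // body, including the prefetch, can be vectorized.
    void axpy_with_prefetch(double *x, const double *y, double a, int n) {
      for (int i = 0; i < n; ++i) {
        __builtin_prefetch(&x[i + 128]); // software prefetch ahead of the stream
        x[i] += a * y[i];
      }
    }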

@m-saito-fj
Author

I have created several patches that enable LoopVectorize to vectorize loops containing __builtin_prefetch on AArch64 and to generate SVE prefetch (prfb/prfh/prfw/prfd) instructions.
All Patches

This patch is one of them and mainly covers the IR part. With this patch alone, the prefetch intrinsic cannot be vectorized because isLegalMaskedPrefetch returns false by default, so I have not added a test to this patch.

Tests are attached to the other patches; if you want to refer to them, see the following links:
Test for masked prefetch
Test for masked gather prefetch

@m-saito-fj
Author

Gentle ping ...

@m-saito-fj
Author

ping.

Member

@rengolin left a comment


Some comments; I have not looked at all the code.

llvm/docs/LangRef.rst (review comment, outdated and resolved)
case Intrinsic::masked_gather_prefetch: {
const Value *Mask = Args[4];
bool VarMask = !isa<Constant>(Mask);
Align Alignment = cast<ConstantInt>(Args[1])->getAlignValue();
Member


You're not checking whether Args[1] is a ConstantInt. Is that guaranteed elsewhere?

Author


I think it is guaranteed. The masked_gather_prefetch intrinsic is created only by the IRBuilder::CreateMaskedGatherPrefetch function, and Args[1] is created as a ConstantInt there.
This is the same pattern as the masked_gather intrinsic, where Args[1] is not checked either.
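For illustration only, a defensive variant of the alignment extraction could look like the sketch below; it is not part of the patch and simply falls back to a minimal alignment if the operand were ever not a ConstantInt:

    // Hypothetical defensive form of the alignment extraction (not in the patch).
    const auto *AlignCI = dyn_cast<ConstantInt>(Args[1]);
    Align Alignment = AlignCI ? AlignCI->getAlignValue() : Align(1);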

Value *Locality,
const Twine &Name) {
auto *PtrTy = cast<PointerType>(Ptr->getType());
assert(Mask && "Mask should not be all-ones (null)");
Member


I don't get this message. I assumed all ones would be all lanes, and a null value would be all zeroes which would be no lanes (which is still a valid mask, just not useful for pre-fetching).

Author


Thanks for your comment.

I had simply done the same as masked load/masked store. If the mask is null, the mask can be considered unnecessary, so I think treating it as all ones is not a problem.

However, I agree the assert message as it was would be confusing, so I have fixed it:
assert message.

That comment is now outdated because I force-pushed; please see the link above.

auto *PtrsTy = cast<VectorType>(Ptrs->getType());
ElementCount NumElts = PtrsTy->getElementCount();

if (!Mask)
Member


Now I see where that comes from. I think this is a misleading tactic. I'd force the mask to have at least one value in it (if that's what you want) here, and not in the next call.

Author


Thanks for the comment.

I do the same as gather/scatter here.

If the mask is null, the mask can be considered unnecessary, so I think treating it as all ones is not a problem. I think it is fine as it is.
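For reference, the existing gather/scatter builders treat a null mask roughly as sketched below; assuming CreateMaskedGatherPrefetch follows the same pattern, passing no mask simply means all lanes are enabled:

    // Convention used by IRBuilderBase::CreateMaskedGather and similar builders:
    // a null Mask argument is expanded to an all-ones ("all lanes enabled") mask.
    if (!Mask)
      Mask = getAllOnesMask(NumElts);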

case Instruction::Call: {
if (!isa<PrefetchInst>(I))
return !VFDatabase::hasMaskedVariant(*(cast<CallInst>(I)), VF);
auto *Ptr = getPrefetchPointerOperand(I);
Member


Can't you implement hasMaskedVariant for prefetch?

Author


Thanks for the comment.

I don't think I can implement it. hasMaskedVariant checks whether the vector library in use provides a masked, vectorized implementation, and prefetch will never be provided by such a library because the prefetch intrinsic is lowered to a single instruction. For that reason, I did not consider using hasMaskedVariant.
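For comparison, legality of masked memory operations is queried through TargetTransformInfo hooks rather than through the vector function database; the isLegalMaskedPrefetch mentioned earlier presumably mirrors the shape of the existing hooks:

    // Existing TargetTransformInfo hooks of this shape (shown for comparison only):
    bool isLegalMaskedLoad(Type *DataType, Align Alignment) const;
    bool isLegalMaskedGather(Type *DataType, Align Alignment) const;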


github-actions bot commented Nov 30, 2023

✅ With the latest revision this PR passed the C/C++ code formatter.

@m-saito-fj
Author

This patch conflicts with the main branch; it will be fixed and force-pushed after a rebase.

@m-saito-fj m-saito-fj force-pushed the vectorization-builtin-prefetch branch 4 times, most recently from 64541d2 to 6635fa0 Compare January 4, 2024 05:22
@m-saito-fj
Author

The rebase of this patch is complete.

@m-saito-fj m-saito-fj force-pushed the vectorization-builtin-prefetch branch 2 times, most recently from 3b8fb11 to 15c4966 Compare January 12, 2024 13:03
Allow vectorization of loops containing __builtin_prefetch. Add
masked_prefetch intrinsic and masked_gather_prefetch intrinsic for this
purpose. Also, add a process to vectorize prefetch intrinsic in
LoopVectorize.
@m-saito-fj m-saito-fj requested a review from fhahn January 15, 2024 04:01
@davemgreen
Collaborator

A couple of high level comments:

  • Is the main idea to make sure that we do vectorize the loop, or are the prefetches important for performance?
  • Under the assumption that prefetching works on cache lines, we could consider scalarizing the prefetches instead, so that just normal prefetches are emitted.
  • Or alternatively under the assumption that the users often put the prefetches in the wrong place, we could consider just dropping them from the vectorized loop.
  • Having said that, SVE does these instructions so it would make sense to add them.

I think it might make sense to have an initial patch that introduces the intrinsics+langref and adds the codegen support for them - both for generic targets and for SVE. Once that is in, it would make adding tests for vectorization easier.
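To make the scalarization alternative mentioned above concrete, here is an illustrative C-level view (made-up names, assuming a vectorization factor of 4): the vectorized loop keeps a single scalar prefetch per vector iteration, on the assumption that the addresses prefetched for one group usually share a cache line:

    // Illustrative only: one possible meaning of "scalarize the prefetches".
    void sum_groups_of_4(const float *a, float *out, int n) {
      float s0 = 0, s1 = 0, s2 = 0, s3 = 0;
      for (int i = 0; i + 3 < n; i += 4) {
        __builtin_prefetch(&a[i + 64]); // one scalar prefetch per vector iteration
        s0 += a[i]; s1 += a[i + 1]; s2 += a[i + 2]; s3 += a[i + 3];
      }
      *out = s0 + s1 + s2 + s3;
    }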

@m-saito-fj
Author

@davemgreen Thank you for your comment.

Is the main idea to make sure that we do vectorize the loop, or are the prefetches important for performance?

My main goal is to vectorize loops containing __builtin_prefetch. Certainly, there are several ways to vectorize a loop containing __builtin_prefetch. I implemented vectorization using the SVE instructions, but I think it would be better to also support the other approaches and provide an option to choose between them.

I think it might make sense to have an initial patch that introduces the intrinsics+langref and adds the codegen support for them - both for generic targets and for SVE. Once that is in, it would make adding tests for vectorization easier.

Implementation
The above implementation consists of three commits. The commits and the features each one adds are as follows:

  1. [Vectorize] Vectorization for __builtin_prefetch
    1.1. Add new vector prefetch intrinsic + langref
    1.2. Addition of prefetch intrinsic vectorization process to LoopVectorize
  2. [CodeGen] CodeGen for masked_prefetch and masked_gather_prefetch
    2.1. Codegen support for vector prefetch intrinsic
  3. [AArch64][Vectorize] Vectorization for __builtin_prefetch for AArch64
    3.1. Allow vectorization by LoopVectorize in AArch64
    3.2. Addition of Lowering process in CodeGen for AArch64

Does this mean it would be better to aim to merge 1.1, 2.1 and 3.2 in one patch first?

@davemgreen
Collaborator

Could we add 1.1 + 2.1 + tests for some architecture (maybe AArch64 without SVE) in one patch, with 3.1 added in a second? 1.2 + 3.2 + tests can then make up the third.

@m-saito-fj
Author

Could we add 1.1 + 2.1 + tests for some architecture (maybe AArch64 without SVE) in one patch, with 3.1 added in a second? 1.2 + 3.2 + tests can then make up the third.

Thank you for the suggestion.
I will proceed with what you suggested and will close this request after I create another pull request.
